
fix: replace O(n²) string concat with array buffering for stdout/stderr#1271

Closed
beran-t wants to merge 3 commits into main from stdout-memory-leak-fix

Conversation


@beran-t beran-t commented Apr 14, 2026

Summary

Replaces this._stdout += out / self._stdout += out with array-based buffering (push + lazy join) in both JS and Python SDKs to eliminate quadratic memory amplification when accumulating large command output.

Related: e2b-dev/code-interpreter#251

Problem

JavaScript string += is O(n) per append (copies the entire existing string). When V8's cons-string optimization is defeated (e.g. by calling .indexOf() or .length on the accumulator between appends), this becomes O(n²) total — causing massive memory amplification and event loop stalls.
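As a rough illustration of the amplification, here is a toy Python cost model of the two strategies. This is an assumption-laden sketch: it models the worst-case JS behavior where every `+=` rewrites the whole accumulator; CPython's own `+=` is often optimized in place, so this models the cost, it is not a benchmark.

```python
def copied_bytes_concat(chunks: list[str]) -> int:
    """Bytes copied if every append rewrites the whole accumulator (worst-case JS)."""
    acc_len = 0
    copied = 0
    for c in chunks:
        acc_len += len(c)
        copied += acc_len  # the entire accumulated string is copied on each append
    return copied


def copied_bytes_buffered(chunks: list[str]) -> int:
    """Bytes copied with push + one final join: each byte is copied once."""
    return sum(len(c) for c in chunks)


chunks = ["x" * 1000] * 200  # 200 KB of output in 1 KB chunks
print(copied_bytes_concat(chunks))    # 20_100_000 bytes — quadratic in chunk count
print(copied_bytes_buffered(chunks))  # 200_000 bytes — linear
```

With 200 chunks the concat model already copies ~100x more bytes than buffering; the gap widens linearly with chunk count.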

Changes

  • JS SDK (packages/js-sdk/src/sandbox/commands/commandHandle.ts): Replace _stdout/_stderr string fields with _stdoutChunks/_stderrChunks arrays + cached lazy join() in getters. Cache is invalidated on each new chunk push.
  • Python async SDK (packages/python-sdk/e2b/sandbox_async/commands/command_handle.py): Same pattern — list[str] + ''.join() in properties and at command end.
  • Python sync SDK (packages/python-sdk/e2b/sandbox_sync/commands/command_handle.py): Same pattern.
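The push-plus-cached-lazy-join pattern described in the bullets above can be sketched in Python roughly as follows (class and field names are illustrative, not the actual SDK source):

```python
from typing import Optional


class BufferedOutput:
    """Illustrative sketch of the chunk-buffering pattern (not the SDK source)."""

    def __init__(self) -> None:
        self._chunks: list[str] = []
        self._cached: Optional[str] = None  # invalidated whenever a chunk arrives

    def append(self, out: str) -> None:
        self._chunks.append(out)  # amortized O(1), no copy of prior output
        self._cached = None

    @property
    def value(self) -> str:
        # Lazy join: recomputed only after new chunks land,
        # so repeated reads between appends are O(1)
        if self._cached is None:
            self._cached = "".join(self._chunks)
        return self._cached
```

The cache invalidation on append is the piece that keeps repeated getter access cheap while a command is still streaming output.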

Test plan

  • pnpm run build — passes
  • pnpm run typecheck — passes
  • pnpm run lint — passes
  • pnpm run test — 299/299 relevant tests pass (the 28 volume-test failures are pre-existing and unrelated)

…ut/stderr

Replace `this._stdout += out` with array-based buffering (`push` + lazy
`join`) in JS and Python SDKs to eliminate quadratic memory amplification
when accumulating large command output.

Fixes e2b-dev/code-interpreter#251

github-actions Bot commented Apr 14, 2026

Package Artifacts

Built from c6866bf. Download artifacts from this workflow run.

JS SDK (e2b@2.19.1-stdout-memory-leak-fix.0):

npm install ./e2b-2.19.1-stdout-memory-leak-fix.0.tgz

CLI (@e2b/cli@2.9.1-stdout-memory-leak-fix.0):

npm install ./e2b-cli-2.9.1-stdout-memory-leak-fix.0.tgz

Python SDK (e2b==2.20.0+stdout-memory-leak-fix):

pip install ./e2b-2.20.0+stdout.memory.leak.fix-py3-none-any.whl


changeset-bot Bot commented Apr 14, 2026

⚠️ No Changeset found

Latest commit: 438a9b6

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b8cefe96f5


Command stdout output.
"""
- return self._stdout
+ return "".join(self._stdout_chunks)

P2: Cache joined async output between chunk updates

The new stdout/stderr properties now call "".join(...) on every access, which re-copies the full buffered output each time even when no new chunk arrived. This is a regression from the previous O(1) getter behavior and can reintroduce quadratic CPU/memory churn for callers that poll cmd.stdout while a process is running; adding a cached joined value (invalidated on append) would avoid this.


Comment thread packages/js-sdk/tests/memory-repro.mts Outdated
sys.stdout.flush()
"`,
{
timeout: 600,

P2: Use the correct timeout option in memory repro

This script passes timeout: 600, but the JS command API consumes timeoutMs, so this value is ignored and the default 60s timeout is used. For large-output runs near a minute, that makes the repro flaky and can invalidate the benchmark conclusions by timing out unexpectedly.


Comment thread packages/js-sdk/tests/memory-repro.mts Outdated
Comment on lines +56 to +57
onStdout: () => {
chunkCount++

P2: Actually trigger flattening in the repro callback

The script states it is "reading .stdout in callback" to force flattening, but the callback only increments chunkCount. As written, the repro does not exercise the claimed flattening path, so results from this script cannot validate the specific regression/fix scenario described in the commit.

@claude claude Bot left a comment


Additional findings (outside current diff — PR may have been updated during review):

  • 🟡 packages/js-sdk/tests/memory-repro.mts:55-58 — The onStdout callback in memory-repro.mts only does chunkCount++ and never accesses cmd.stdout, contradicting the comments at lines 26 and 30 that claim it forces V8 string flattening. As a result, running this script only measures the 'normal usage' scenario (427 MB / 2.2x / 15.2s), not the 'with string flattening' scenario (669 MB / 3.5x / 59.5s) that the PR description attributes to it. To reproduce the critical benchmark, the callback should call cmd.stdout or cmd.stdout.indexOf('\n').

    Extended reasoning...

    What the bug is

    The memory reproduction script at packages/js-sdk/tests/memory-repro.mts contains misleading comments. Line 26 states // Use start() + reading .stdout in callback to force V8 string flattening and lines 29-31 print Reading .stdout in callback to force V8 string flattening (simulates readLines indexOf behavior).... However, the actual onStdout callback at lines 56-58 is simply:

    onStdout: () => {
      chunkCount++
    },

    It never accesses cmd.stdout at all.

    The specific code path

    V8's cons-string optimization lazily defers concatenation until the string is actually read. When user code accesses .stdout (e.g. via .indexOf() as code-interpreter's readLines does), V8 must flatten the cons-string tree into a single contiguous buffer — this is the expensive O(n²) operation the fix targets. Without that access in the callback, V8 can keep the cons-string tree intact throughout the command's lifetime, which is a different (and cheaper) memory profile.

    Why existing code doesn't prevent it

    Nothing enforces that the callback actually accesses cmd.stdout. The comments and console.log are purely documentation and have no effect on behavior. The script compiles and runs successfully either way, so the mismatch is invisible unless you read both the comments and the callback implementation carefully.

    Step-by-step proof

    1. The PR description's benchmark table has two distinct post-fix rows:
      • After fix (with string flattening): 200 MB → 669 MB / 3.5x / 59.5s
      • After fix (normal usage): 200 MB → 427 MB / 2.2x / 15.2s
    2. "String flattening" is defined in the PR as accessing cmd.stdout.indexOf('\n') in the onStdout callback.
    3. The committed callback does not call cmd.stdout at any point.
    4. Therefore the script cannot reproduce the 669 MB / 3.5x / 59.5s row — it only measures the 427 MB / 2.2x / 15.2s (normal usage) row.
    5. Any benchmark numbers produced by running this script as committed correspond to the normal usage scenario, not the string flattening scenario referenced in the inline console.log.

    Impact

    This is a nit-level issue limited to the benchmark/test script. The production fix in commandHandle.ts is correct and unaffected. However, the most important real-world scenario (callbacks that call cmd.stdout to scan for newlines, as readLines in code-interpreter does) cannot be verified with this script as committed.

    How to fix

    Change the onStdout callback to access cmd.stdout:

    onStdout: () => {
      chunkCount++
      void cmd.stdout.indexOf('\n') // force V8 cons-string flattening
    },

    This mirrors what code-interpreter's readLines actually does and will produce benchmark numbers consistent with the 'with string flattening' row in the PR description.

Comment on lines 41 to 52
"""
Command stdout output.
"""
- return self._stdout
+ return "".join(self._stdout_chunks)

@property
def stderr(self):
"""
Command stderr output.
"""
- return self._stderr
+ return "".join(self._stderr_chunks)


🔴 The Python async SDK's stdout and stderr properties call ''.join(self._stdout_chunks) on every access with no caching, reintroducing the same O(n²) pattern the PR set out to fix. If a user accesses handle.stdout inside an on_stdout callback (as code-interpreter's readLines does), each callback invocation triggers a full re-join of all accumulated chunks — O(total_size) per callback, O(n²) total. Fix by adding _stdout_cached: Optional[str] = None, returning the cached value in the getter, and setting it to None on each append, mirroring the JS implementation.

Extended reasoning...

What the bug is: The Python async SDK's stdout and stderr properties in command_handle.py (lines 41–52) always call ''.join(self._stdout_chunks) on every property access with no caching. The JS SDK correctly maintains _stdoutCached/_stderrCached fields that are set to undefined on each chunk push and lazily computed in the getter. The Python async SDK received the array-buffering change (replacing +=) but did not receive the corresponding caching layer.

The specific code path: When a user passes on_stdout to sandbox.commands.start(), the async SDK calls that callback for each chunk in _handle_events(). If the callback (or any downstream consumer) accesses handle.stdout to get the accumulated output — exactly the pattern code-interpreter's readLines uses to scan for newlines via str.find() — each callback invocation hits the property getter, which calls ''.join(self._stdout_chunks) from scratch.

Why existing code doesn't prevent it: The PR replaced self._stdout += out (O(n) per append) with self._stdout_chunks.append(out) (O(1) per append), which is correct. But it did not add a cache guard in the getter. Before the PR, reading self._stdout in a callback was O(1) (a simple attribute read). After the PR, reading handle.stdout in a callback is O(accumulated_size) because join() must traverse all chunks every time.

The impact: With N chunks totaling T bytes, accessing handle.stdout once per callback produces O(T) work per call and O(N × T) total — quadratic in T when N grows proportionally to T (fixed chunk sizes). The PR's own benchmark documents this scenario ('string flattening') measuring 669 MB peak heap / 3.5x amplification for 200 MB of output in the JS SDK after the fix. The Python async SDK has no protection at all, so this scenario will be at least as bad as the pre-fix JS numbers.

How to fix: Mirror the JS implementation exactly. Add self._stdout_cached: Optional[str] = None and self._stderr_cached: Optional[str] = None as instance variables in __init__. In the stdout property getter, return self._stdout_cached if it is not None, otherwise compute ''.join(self._stdout_chunks), store it in self._stdout_cached, and return it. In _iterate_events, after self._stdout_chunks.append(out), set self._stdout_cached = None. Apply the same pattern for stderr.

Step-by-step proof: Suppose a user runs sandbox.commands.start(cmd, on_stdout=lambda _: print(len(handle.stdout))) against a command producing 200,000 chunks of ~1,000 bytes each (200 MB total, matching the PR's benchmark). Chunk 1 arrives → _stdout_chunks has 1 element → handle.stdout joins 1 string, ~1,000 bytes of work. Chunk 2 → joins 2 strings, ~2,000 bytes. Chunk k → joins k strings, ~k×1,000 bytes. Total work = 1,000 × (1 + 2 + … + 200,000) ≈ 2×10¹³ bytes of string copying — clearly O(n²). This is the exact scenario the PR benchmarks, and the Python async SDK has no defense against it.
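The total claimed in the proof above can be checked directly (a quick arithmetic sanity check, not SDK code):

```python
# Sum of join costs: chunk k joins k strings of ~1_000 bytes, for k = 1..200_000
n, chunk_bytes = 200_000, 1_000
total = chunk_bytes * n * (n + 1) // 2
print(total)  # 20_000_100_000_000 — roughly 2e13 bytes of copying, as claimed
```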

Comment on lines 37 to 44
self._handle_kill = handle_kill
self._events = events

- self._stdout: str = ""
- self._stderr: str = ""
+ self._stdout_chunks: list[str] = []
+ self._stderr_chunks: list[str] = []

self._result: Optional[CommandResult] = None
self._iteration_exception: Optional[Exception] = None

🟣 The sync CommandHandle has no public stdout or stderr properties, while the async AsyncCommandHandle exposes them via @property. This is a pre-existing API inconsistency: before this PR the sync SDK stored private _stdout: str, and after this PR it stores private _stdout_chunks: list[str] — neither exposes a public accessor. Users holding a background sync handle cannot inspect accumulated output via handle.stdout and would receive AttributeError.

Extended reasoning...

What the bug is

The async AsyncCommandHandle has always exposed stdout and stderr as @property accessors returning ''.join(self._stdout_chunks). The sync CommandHandle has never had equivalent public properties — before this PR it had self._stdout: str = '' (private), and after this PR it has self._stdout_chunks: list[str] = [] (still private). There is no public stdout or stderr attribute on the sync handle.

The specific code path

In packages/python-sdk/e2b/sandbox_sync/commands/command_handle.py, the class defines only a pid property. The _stdout_chunks and _stderr_chunks fields added by this PR at lines 42-43 are purely internal. The async counterpart in packages/python-sdk/e2b/sandbox_async/commands/command_handle.py defines @property stdout and @property stderr that are part of the documented public API.

Why existing code does not prevent it

The PR only changes internal storage from str to list[str] and correctly wires up the CommandResult returned by wait(). It does not add or remove any public properties. The public API surface of the sync handle is unchanged before and after this PR.

Addressing the refutation

The refutation argues this asymmetry is intentional: the sync SDK processes events lazily (only when iterated), so accumulated output mid-run is less meaningful, and users can use wait(on_stdout=...) callbacks or iterate via for stdout, stderr, pty in handle:. This is architecturally accurate. However, a user who uses both SDKs or who reads the async SDK docs would reasonably expect handle.stdout to work on either handle type. The AsyncCommandHandle treats it as standard public API, creating a discoverable inconsistency. The refutation is correct that this is intentional design, but it does not eliminate the confusion users encounter when the APIs diverge.

Impact

Any code that does handle = sandbox.commands.run(cmd, background=True) followed by handle.stdout will raise AttributeError: 'CommandHandle' object has no attribute 'stdout'. The fix would be trivially additive: @property stdout: return ''.join(self._stdout_chunks).

Step-by-step proof

  1. User calls handle = sandbox.commands.run('sleep 5 && echo done', background=True)
  2. User tries print(handle.stdout) to check intermediate output (as they would with AsyncCommandHandle)
  3. Python raises AttributeError: 'CommandHandle' object has no attribute 'stdout'
  4. Same code with the async SDK works fine because @property stdout is defined on AsyncCommandHandle
  5. This is pre-existing: even before this PR, step 3 would raise the same error — the private _stdout field was never a public property

Pre-existing nature

Before this PR: self._stdout: str = '' — private, no public property. After this PR: self._stdout_chunks: list[str] = [] — still private, still no public property. The inconsistency predates this PR.
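If maintainers did want API parity, the additive fix suggested above might look like the following sketch (the field names are taken from the discussion, not verified against the repo, and the class is a minimal stand-in for the sync handle):

```python
class CommandHandle:
    """Minimal stand-in for the sync handle, showing only the proposed accessors."""

    def __init__(self) -> None:
        self._stdout_chunks: list[str] = []
        self._stderr_chunks: list[str] = []

    @property
    def stdout(self) -> str:
        # Join on access; a cached variant (as in the JS SDK) could be layered on top
        return "".join(self._stdout_chunks)

    @property
    def stderr(self) -> str:
        return "".join(self._stderr_chunks)
```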

Mirror the JS SDK's caching pattern — avoid re-joining all chunks on
every property access when no new data has arrived.
@beran-t beran-t closed this Apr 16, 2026
